{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "2eec5cc39a59"
   },
   "outputs": [],
   "source": [
    "# Copyright 2024 Google LLC\n",
    "#\n",
    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "#     https://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1ecdfe45cea4"
   },
   "source": [
    "<h1 align=\"center\"> <a href=\"../README.md\">Vertex AI: Gemini Evaluations Playbook </a><br>\n",
    "Experiment, Evaluate, and Analyze</h1>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "49652b8c0a02"
   },
   "source": [
    "<table align=\"left\">\n",
    "  <td style=\"text-align: center\">\n",
    "    <a href=\"https://art-analytics.appspot.com/r.html?uaid=G-FHXEFWTT4E&utm_source=aRT-gemini_evals_playbook_evaluate-from_notebook-colab&utm_medium=aRT-clicks&utm_campaign=gemini_evals_playbook_evaluate-from_notebook-colab&destination=gemini_evals_playbook_evaluate-from_notebook-colab&url=https%3A%2F%2Fcolab.sandbox.google.com%2Fgithub%2FGoogleCloudPlatform%2Fapplied-ai-engineering-samples%2Fblob%2Fmain%2Fgenai-on-vertex-ai%2Fgemini%2Fevals_playbook%2Fnotebooks%2F1_gemini_evals_playbook_evaluate.ipynb\">\n",
    "      <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Google Colaboratory logo\"><br> Run in Colab\n",
    "    </a>\n",
    "  </td>\n",
    "  <td style=\"text-align: center\">\n",
    "    <a href=\"https://art-analytics.appspot.com/r.html?uaid=G-FHXEFWTT4E&utm_source=aRT-gemini_evals_playbook_evaluate-from_notebook-colab_ent&utm_medium=aRT-clicks&utm_campaign=gemini_evals_playbook_evaluate-from_notebook-colab_ent&destination=gemini_evals_playbook_evaluate-from_notebook-colab_ent&url=https%3A%2F%2Fconsole.cloud.google.com%2Fvertex-ai%2Fcolab%2Fimport%2Fhttps%3A%252F%252Fraw.githubusercontent.com%252FGoogleCloudPlatform%252Fapplied-ai-engineering-samples%252Fmain%252Fgenai-on-vertex-ai%252Fgemini%252Fevals_playbook%252Fnotebooks%252F1_gemini_evals_playbook_evaluate.ipynb\">\n",
    "      <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Run in Colab Enterprise\n",
    "    </a>\n",
    "  </td>    \n",
    "  <td style=\"text-align: center\">\n",
    "    <a href=\"https://art-analytics.appspot.com/r.html?uaid=G-FHXEFWTT4E&utm_source=aRT-gemini_evals_playbook_evaluate-from_notebook-github&utm_medium=aRT-clicks&utm_campaign=gemini_evals_playbook_evaluate-from_notebook-github&destination=gemini_evals_playbook_evaluate-from_notebook-github&url=https%3A%2F%2Fgithub.com%2FGoogleCloudPlatform%2Fapplied-ai-engineering-samples%2Fblob%2Fmain%2Fgenai-on-vertex-ai%2Fgemini%2Fevals_playbook%2Fnotebooks%2F1_gemini_evals_playbook_evaluate.ipynb\">\n",
    "      <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\"><br> View on GitHub\n",
    "    </a>\n",
    "  </td>\n",
    "  <td style=\"text-align: center\">\n",
    "    <a href=\"https://art-analytics.appspot.com/r.html?uaid=G-FHXEFWTT4E&utm_source=aRT-gemini_evals_playbook_evaluate-from_notebook-vai_workbench&utm_medium=aRT-clicks&utm_campaign=gemini_evals_playbook_evaluate-from_notebook-vai_workbench&destination=gemini_evals_playbook_evaluate-from_notebook-vai_workbench&url=https%3A%2F%2Fconsole.cloud.google.com%2Fvertex-ai%2Fworkbench%2Fdeploy-notebook%3Fdownload_url%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fapplied-ai-engineering-samples%2Fblob%2Fmain%2Fgenai-on-vertex-ai%2Fgemini%2Fevals_playbook%2Fnotebooks%2F1_gemini_evals_playbook_evaluate.ipynb\">\n",
    "      <img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
    "    </a>\n",
    "  </td>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "143ee20a6bca"
   },
   "source": [
    "# Evals Playbook: Experiment, Evaluate & Analyze\n",
    "\n",
    "This notebook shows you how to define experiments, run evaluations to assess model performance, and analyze evaluation results including side-by-side comparison of results across different experiments and runs. The notebook performs following steps:\n",
    "\n",
    "- Define the evaluation task\n",
    "- Prepare evaluation dataset\n",
    "- Define an experiment by:\n",
    "    - Configuring the model\n",
    "    - Setting prompt and system instruction\n",
    "    - Establishing evaluation criteria (metrics)\n",
    "- Run evaluations using [Vertex AI Rapid Eval SDK](https://cloud.google.com/vertex-ai/generative-ai/docs/models/rapid-evaluation)\n",
    "- Log detailed results and summarizing through aggregated metrics.\n",
    "- Side-by-side comparison of evaluation runs for a comprehensive analysis."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0a5a0a12e5ca"
   },
   "source": [
    "## 🚧 0. Pre-requisites\n",
    "\n",
    "Make sure that you have prepared the environment following steps in [0_gemini_evals_playbook_setup.ipynb](0_gemini_evals_playbook_setup.ipynb). If the 0_gemini_evals_playbook_setup notebook has been run successfully, the following are set up:\n",
    "\n",
    "* GCP project and APIs to run the eval pipeline\n",
    "* All the required IAM permissions\n",
    "* Environment to run the notebooks\n",
    "* Bigquery datasets and tables to track evaluation results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "6bc4f990ca5b"
   },
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "edb768ddbc75"
   },
   "source": [
    "### Read configurations\n",
    "\n",
    "The configuration saved previously in [0_gemini_evals_playbook_setup.ipynb](0_gemini_evals_playbook_setup.ipynb) will be used for initializing variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "f2b473618d44"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "\n",
    "module_path = os.path.abspath(os.path.join(\"..\"))\n",
    "sys.path.append(module_path)\n",
    "print(f\"module_path: {module_path}\")\n",
    "\n",
    "# Import all the parameters\n",
    "from utils.config import (LOCATION, PROJECT_ID, STAGING_BUCKET,\n",
    "                          STAGING_BUCKET_URI)\n",
    "from utils.evals_playbook import Evals, generate_uuid"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "e236046a4da7"
   },
   "source": [
    "### Import libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "d13545946b71"
   },
   "outputs": [],
   "source": [
    "import datetime\n",
    "import itertools\n",
    "import re\n",
    "\n",
    "import pandas as pd\n",
    "import vertexai\n",
    "from datasets import Dataset, load_dataset\n",
    "from vertexai.evaluation import (EvalTask, PointwiseMetric,\n",
    "                                 PointwiseMetricPromptTemplate, constants)\n",
    "from vertexai.generative_models import (GenerativeModel, HarmBlockThreshold,\n",
    "                                        HarmCategory, SafetySetting)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "d02af5cfe722"
   },
   "source": [
    "### Initialize Vertex AI SDK"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "2c5cb3f27bd2"
   },
   "outputs": [],
   "source": [
    "vertexai.init(project=PROJECT_ID, location=LOCATION, staging_bucket=STAGING_BUCKET_URI)\n",
    "\n",
    "print(\"Vertex AI SDK initialized.\")\n",
    "print(f\"Vertex AI SDK version = {vertexai.__version__}\")\n",
    "\n",
    "# pandas display full column values\n",
    "pd.set_option(\"display.max_colwidth\", None)\n",
    "pd.set_option(\"display.max_rows\", None)\n",
    "pd.set_option(\"display.max_columns\", None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "39ddfb1c94b2"
   },
   "source": [
    "### Define `Evals` object\n",
    "\n",
    "[`Evals`](../utils/evals_playbook.py) is a helper class helps to define tasks, experiments and log evaluation results. Define an instance of `Evals` class to use in the rest of the notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "id": "231081b9d6d4"
   },
   "outputs": [],
   "source": [
    "# Initialize evals object\n",
    "evals = Evals()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8b4289d909db"
   },
   "source": [
    "## 🛠️ 1. Define and configure evaluation task and experiment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1a1c5918a9c0"
   },
   "source": [
    "### Define Evaluation Task\n",
    "\n",
    "An evaluation task defines the task model(s) will be evaluated on. The `task_id` is analogous to a workspace to group experiments and corresponding evaluation runs. This notebook premises on summarization of [PubMed](https://pubmed.ncbi.nlm.nih.gov/) articles as the task."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "33fd82ba8f70"
   },
   "outputs": [],
   "source": [
    "# create and log task\n",
    "task_id = \"task_summarization\"\n",
    "task = evals.Task(\n",
    "    task_id=task_id,\n",
    "    task_desc=\"summarize pubmed articles\",\n",
    "    tags=[\"pubmed\"],\n",
    "    create_datetime=datetime.datetime.now(),\n",
    "    update_datetime=datetime.datetime.now(),\n",
    ")\n",
    "evals.log_task(task)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6c9fa509e5ec"
   },
   "source": [
    "- List all tasks available in the database (lists tasks sorted by task creation time in descending order)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "414668273d5d"
   },
   "outputs": [],
   "source": [
    "evals.get_all_tasks()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "852fb88e8016"
   },
   "source": [
    "### Define Experiment\n",
    "\n",
    "An experiment in Evals Playbook is defined by configuring\n",
    "- Dataset\n",
    "- Model and model configuration\n",
    "- Prompt\n",
    "\n",
    "Each experiment has an `experiment_id` and associated with a `task_id`. This sectio defines the required components."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "54264de00830"
   },
   "source": [
    "<div class=\"alert alert-block alert-info\">\n",
    "<b>⚠️ We recommend to create unique experiment id for each experiment to enable better tracking and experimentation. ⚠️</b>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "id": "05f3a40020dc"
   },
   "outputs": [],
   "source": [
    "experiment_id = \"Prompt with simple language summary and custom metrics\"\n",
    "# remove any special characters from experiment id\n",
    "_experiment_id = re.sub(\"[^0-9a-zA-Z]\", \"-\", experiment_id.lower())\n",
    "experiment_desc = \"Update system instruction to generate a simple summary with bullets\"\n",
    "tags = [\"pubmed\"]\n",
    "metadata = {}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "2167e3eae619"
   },
   "source": [
    "#### Configure Model\n",
    "\n",
    "Define the Gemini model you want to evaluate your task on including name, configuration settings such as temperature and safety settings."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "47b858444e2f"
   },
   "source": [
    "- Add [system instructions](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/system-instructions) to give the model additional context to understand the task, provide more customized responses, and adhere to specific guidelines over the full user interaction with the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "id": "abe2ba9b3069"
   },
   "outputs": [],
   "source": [
    "system_instruction = \"\"\"Instruction: You are a medical researcher writing a plain language Summary of your Article for a layperson.\n",
    "\n",
    "Translate any medical terms to simple english explanations.\n",
    "Use first-person 'We'.  Use short bullet points addressing following\n",
    "- Purpose: What was the purpose of the study?\n",
    "- Research: What did the researchers do?\n",
    "- Findings: What did they find?\n",
    "- Implications: What does this mean for me?\"\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "66c7b985c3c7"
   },
   "source": [
    "- Define generation config and safety settings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "id": "225a6d5c6fdc"
   },
   "outputs": [],
   "source": [
    "generation_config = {\n",
    "    \"temperature\": 0.1,\n",
    "}\n",
    "\n",
    "safety_settings = [\n",
    "    SafetySetting(\n",
    "        category=HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,\n",
    "        threshold=HarmBlockThreshold.BLOCK_NONE,\n",
    "    ),\n",
    "    SafetySetting(\n",
    "        category=HarmCategory.HARM_CATEGORY_HATE_SPEECH,\n",
    "        threshold=HarmBlockThreshold.BLOCK_NONE,\n",
    "    ),\n",
    "    SafetySetting(\n",
    "        category=HarmCategory.HARM_CATEGORY_HARASSMENT,\n",
    "        threshold=HarmBlockThreshold.BLOCK_NONE,\n",
    "    ),\n",
    "    SafetySetting(\n",
    "        category=HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,\n",
    "        threshold=HarmBlockThreshold.BLOCK_NONE,\n",
    "    ),\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "id": "5b53dacae637"
   },
   "outputs": [],
   "source": [
    "model = GenerativeModel(\n",
    "    model_name=\"gemini-2.0-flash-001\",\n",
    "    generation_config=generation_config,\n",
    "    safety_settings=safety_settings,\n",
    "    system_instruction=system_instruction,\n",
    "    # TODO: Add tools and tool_config\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "c4eba228f97c"
   },
   "source": [
    "#### Prepare Prompt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "a5707db6440e"
   },
   "source": [
    "- Prepare a prompt template for the experiment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "9051a4b75e9a"
   },
   "outputs": [],
   "source": [
    "prompt_id = \"short bulleted list with format\"\n",
    "prompt_description = \"instruction with short bullets addressing specific questions\"\n",
    "\n",
    "# Prompt Template\n",
    "prompt_template = \"Article: {context} \\nSummary:\"\n",
    "\n",
    "evals.save_prompt_template(task_id, _experiment_id, prompt_id, prompt_template)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9ebd502029af"
   },
   "source": [
    "- Configure prompt id, description for tracking"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "7218e4fad16d"
   },
   "outputs": [],
   "source": [
    "prompt = evals.Prompt(\n",
    "    prompt_id=prompt_id,\n",
    "    prompt_description=prompt_description,\n",
    "    prompt_type=\"single-turn\",  # single-turn, chat,\n",
    "    is_multimodal=False,\n",
    "    system_instruction=system_instruction,\n",
    "    prompt_template=prompt_template,\n",
    "    create_datetime=datetime.datetime.now(),\n",
    "    update_datetime=datetime.datetime.now(),\n",
    "    tags=tags,\n",
    ")\n",
    "evals.log_prompt(prompt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1b4184c4ca52"
   },
   "source": [
    "#### Prepare evaluation dataset\n",
    "\n",
    "This notebook uses a sample of [PubMed](https://pubmed.ncbi.nlm.nih.gov/) articles that are hosted on [HuggingFace](https://huggingface.co/datasets/ccdv/pubmed-summarization)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "548b40adccb0"
   },
   "source": [
    "- Download sample dataset (10 rows) of PubMed articles for the task."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "id": "cec58355869d"
   },
   "outputs": [],
   "source": [
    "# get sample dataset from PubMed articles\n",
    "ds_stream = load_dataset(\n",
    "    \"ccdv/pubmed-summarization\", \"document\", split=\"test\", streaming=True\n",
    ")\n",
    "num_rows = 10\n",
    "dataset = Dataset.from_list(list(itertools.islice(ds_stream, num_rows)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "d9d6134e8747"
   },
   "source": [
    "- Pre-process and prepare dataset to use with the evaluator.\n",
    "\n",
    "Prepare the dataset as Pandas dataframe in the format expected by the [Vertex AI Rapid Eval SDK](https://cloud.google.com/vertex-ai/generative-ai/docs/models/rapid-evaluation#dataset-prep).\n",
    "\n",
    "Dataset column names:\n",
    "- `reference`: The column name of ground truth in the dataset.\n",
    "- `context`: The column name containing article passed as the context.\n",
    "- `instruction`: System instruction configured to pass to the model\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "id": "e286fa8ac68a"
   },
   "outputs": [],
   "source": [
    "# convert HuggingFace dataset to Pandas dataframe\n",
    "eval_dataset = dataset.to_pandas()\n",
    "# rename columns as per Vertex AI Rapid Eval SDK defaults\n",
    "eval_dataset.columns = [\"context\", \"reference\"]\n",
    "# add instruction for calculating metrics (not all metrics need instruction)\n",
    "eval_dataset[\"instruction\"] = system_instruction\n",
    "# add prompt column\n",
    "eval_dataset[\"prompt\"] = eval_dataset[\"context\"].apply(\n",
    "    lambda x: prompt_template.format(context=x)\n",
    ")\n",
    "# add prompt id for tracking\n",
    "eval_dataset[\"dataset_row_id\"] = [f\"dataset_row_{i}\" for i in eval_dataset.index]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3fb7dbc495c6"
   },
   "source": [
    "- Verify a few samples in the prepared evaluation dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "afa2b2bca3d3"
   },
   "outputs": [],
   "source": [
    "print(f\"Number of rows: {eval_dataset.shape}\")\n",
    "eval_dataset.head(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "694e0e136949"
   },
   "source": [
    "- Optionally, save the dataset in Cloud Storage (or BigQuery) to reuse."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "254452fb25cb"
   },
   "outputs": [],
   "source": [
    "file_name = \"pubmed_summary.csv\"\n",
    "gcs_file_path = f\"gs://{STAGING_BUCKET}/{task_id}/data/{file_name}\"\n",
    "# Save dataset to Cloud Storage\n",
    "eval_dataset.to_csv(gcs_file_path, index=False)\n",
    "print(f\"Dataset saved at {gcs_file_path} successfully!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "a094cc0f9676"
   },
   "source": [
    "#### Configure Metrics\n",
    "\n",
    "In this section, you configure the evaluation criteria for your task. You can choose from the [built-in metrics (or metric bundles)](https://cloud.google.com/vertex-ai/generative-ai/docs/models/rapid-evaluation#metric-bundles) from Vertex AI Rapid Eval SDK or define a custom metric."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "b0f0eaa9a9e3"
   },
   "source": [
    "- Define prebuilt/built-in metrics with Vertex GenAI Evaluation or bring your own metrics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "id": "0fc8d6d16c2e"
   },
   "outputs": [],
   "source": [
    "# Creating custom metrics for Pointwise Evaluation;\n",
    "# You can define the metric following either a template of criteria and rating rubric\n",
    "# or using a free form prompt. One example for each is demonstrated below\n",
    "\n",
    "# Example 1: format adherence metric, to evaluate if the LLM strictly followed the required formatting\n",
    "criteria = {\n",
    "    \"First-person We\": \"The text is written in first person 'we'\",\n",
    "    \"Format\": \"The output is formatted in bullets\",\n",
    "    \"Completeness\": \"All four sections, purpose, research, findings and implications are addressed in the output\",\n",
    "}\n",
    "\n",
    "pointwise_rating_rubric = {\n",
    "    \"5\": \"Perfectly formatted: Text is in first person 'we', formatted in bullets and all four sections purpose, research, findings and implications are addressed in the output\",\n",
    "    \"4\": \"Mostly formatted: Content is formatted in bullets and all four sections purpose, research, findings and implications are addressed in the output, but failed to write in first person 'we' \",\n",
    "    \"3\": \"Somewhat formatted: Content is formatted in bullets and but failed to address one of the four sections purpose, research, findings and implications\",\n",
    "    \"2\": \"Poorly formatted : Content is may or may not be formatted in bullets and failed to address two out of the four sections purpose, research, findings and implications\",\n",
    "    \"1\": \"Very poorly formatted: Content is not formatted in bullets and failed to address two or more out of the four sections purpose, research, findings and implications\",\n",
    "}\n",
    "\n",
    "# The metric prompt template contains default prompts pre-defined for unspecified components.\n",
    "format_adherence_metric_prompt_template = PointwiseMetricPromptTemplate(\n",
    "    criteria=criteria,\n",
    "    rating_rubric=pointwise_rating_rubric,\n",
    "    input_variables=[\"prompt\", \"reference\"],\n",
    ")\n",
    "\n",
    "# Display the assembled prompt template that will be sent to Gen AI Eval Service\n",
    "# along with the input data for model-based evaluation.\n",
    "# print(format_adherence_metric_prompt_template.prompt_data)\n",
    "\n",
    "# Register the custom \"format_adherence\" model-based metric.\n",
    "format_adherence = PointwiseMetric(\n",
    "    metric=\"format_adherence\",\n",
    "    metric_prompt_template=format_adherence_metric_prompt_template,\n",
    ")\n",
    "\n",
    "\n",
    "# Example 2: text quality and relevance to layperson\n",
    "free_form_pointwise_metric_prompt = \"\"\"\n",
    "# Instruction\n",
    "You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models.\n",
    "We will provide you with the user prompt and an AI-generated response.\n",
    "You should first read the user prompt carefully for analyzing the task, and then evaluate the \n",
    "quality of the responses based on and Criteria provided in the Evaluation section below.\n",
    "\n",
    "You will assign the response a score from 5, 4, 3, 2, 1, following the Rating Rubric and Evaluation Steps. \n",
    "Give step-by-step explanations for your scoring, and only choose scores from 5, 4, 3, 2, 1.\n",
    "\n",
    "# Evaluation\n",
    "## Metric Definition\n",
    "You will be assessing Text Quality and relevance to layperson, which measures how effectively the text conveys\n",
    "clear, accurate, and engaging information that is easily understandable by a layperson and directly addresses \n",
    "the user's prompt, considering factors like fluency, coherence, relevance, conciseness and free of \n",
    "complex medical language\n",
    "\n",
    "## Criteria\n",
    "Coherence: The response presents ideas in a logical and organized manner, with clear transitions and a consistent focus, making it easy to follow and understand.\n",
    "Fluency: The text flows smoothly and naturally, adhering to grammatical rules and using appropriate vocabulary.\n",
    "Relevance to layperson: The response is easily understandable by a layperson as opposed to a medical professional\n",
    "Groundedness: The response contains information included only in the context. The response does not reference any outside information.\n",
    "Verbosity: The response is appropriately concise, providing sufficient detail without using complex language to thoroughly address the prompt without being overly wordy or excessively brief.\n",
    "\n",
    "## Rating Rubric\n",
    "5: (Very good). Exceptionally clear, coherent, fluent, and concise. Free of complex Medical language\n",
    "4: (Good). Well-written, coherent, and fluent. Easy to understand by a layperson. Minor room for improvement.\n",
    "3: (Ok). Adequate writing with decent coherence and fluency. May contain some medical jargon and minor ungrounded information. Could be more concise.\n",
    "2: (Bad). Poorly written, lacking coherence and fluency. Geared towards to medical professional as opposed to layperson. May include ungrounded information. \n",
    "1: (Very bad). Very poorly written, incoherent, and non-fluent. Geared towards to medical professional as opposed to layperson. Contains substantial ungrounded information. Severely lacking in conciseness.\n",
    "\n",
    "## Evaluation Steps\n",
    "STEP 1: Assess the response in aspects of all criteria provided. Provide assessment according to each criterion.\n",
    "STEP 2: Score based on the rating rubric. Give a brief rationale to explain your evaluation considering each individual criterion.\n",
    "\n",
    "# User Inputs and AI-generated Response\n",
    "## User Inputs\n",
    "### Prompt\n",
    "{prompt}\n",
    "\n",
    "## AI-generated Response\n",
    "{reference}\n",
    "\"\"\"\n",
    "\n",
    "# Register the custom \"text_quality_relevance_to_layperson\" model-based metric.\n",
    "text_quality_relevance_to_layperson = PointwiseMetric(\n",
    "    metric=\"text_quality_relevance_to_layperson\",\n",
    "    metric_prompt_template=free_form_pointwise_metric_prompt,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8a9bd14eef6a"
   },
   "source": [
    "For a full list of built in metrics:\n",
    "\n",
    "* **Computation-based:** [https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval#computation-based-metrics](https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval#computation-based-metrics)\n",
    "* **Model-based:** [https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval#model-based-metrics](https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval#model-based-metrics) \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "d43161710b07"
   },
   "outputs": [],
   "source": [
    "# List of built in metrics\n",
    "metrics = [\n",
    "    constants.Metric.ROUGE_1,\n",
    "    constants.Metric.ROUGE_L_SUM,\n",
    "    constants.Metric.BLEU,\n",
    "    constants.Metric.FLUENCY,\n",
    "    constants.Metric.COHERENCE,\n",
    "    constants.Metric.SAFETY,\n",
    "    constants.Metric.GROUNDEDNESS,\n",
    "    constants.Metric.SUMMARIZATION_QUALITY,\n",
    "]\n",
    "\n",
    "# build a metric config object for tracking\n",
    "# Add built in metrics\n",
    "metric_config = [\n",
    "    {\"metric_name\": metric, \"type\": \"prebuilt\", \"metric_scorer\": \"Vertex AI\"}\n",
    "    for metric in metrics\n",
    "]\n",
    "\n",
    "# Add custom metrics\n",
    "metric_config.extend(\n",
    "    [\n",
    "        {\n",
    "            \"metric_name\": text_quality_relevance_to_layperson.metric_name,\n",
    "            \"type\": \"custom\",\n",
    "            \"metric_scorer\": \"Vertex AI\",\n",
    "        },\n",
    "        {\n",
    "            \"metric_name\": format_adherence.metric_name,\n",
    "            \"type\": \"custom\",\n",
    "            \"metric_scorer\": \"Vertex AI\",\n",
    "        },\n",
    "    ]\n",
    ")\n",
    "\n",
    "metrics.extend([text_quality_relevance_to_layperson, format_adherence])\n",
    "\n",
    "print(metric_config)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "a11513023fb1"
   },
   "source": [
    "#### Add Experiment\n",
    "\n",
    "Now that you have defined model, prompt, dataset and eval criteria (metrics), let's add them to an experiment and start logging."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "3e4f395d2226"
   },
   "outputs": [],
   "source": [
    "experiment = evals.log_experiment(\n",
    "    task_id=task_id,\n",
    "    experiment_id=experiment_id,\n",
    "    experiment_desc=experiment_desc,\n",
    "    prompt=prompt,\n",
    "    model=model,\n",
    "    metric_config=metric_config,\n",
    "    tags=tags,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "240ca9e38331"
   },
   "source": [
    "- You can view the experiment details"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "9b68eb6ed06d"
   },
   "outputs": [],
   "source": [
    "evals.get_experiment(experiment_id=experiment_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "62d08cb837b7"
   },
   "source": [
    "- You can view the prompt and system instruction if set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "fc92e272341c"
   },
   "outputs": [],
   "source": [
    "evals.get_prompt(prompt_id=prompt_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "f0d87313b494"
   },
   "source": [
    "- List all experiments available"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "bfe5253e4b46"
   },
   "outputs": [],
   "source": [
    "evals.get_all_experiments()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ef2af3ad7255"
   },
   "source": [
    "## 🚀 2. Run experiment(s) for an evaluation task\n",
    "\n",
    "The experiment is now ready to run an evaluation task using the model, prompt, dataset and metrics configured."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "34b606ca956e"
   },
   "source": [
    "- Define [Vertex AI Rapid Eval Task](https://cloud.google.com/vertex-ai/generative-ai/docs/models/rapid-evaluation#evaluation-task). Evaluation tasks must contain an evaluation dataset, and a list of metrics to evaluate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "id": "eded26dd5f4a"
   },
   "outputs": [],
   "source": [
    "_experiment_id = re.sub(\"[^0-9a-zA-Z]\", \"-\", experiment_id.lower())\n",
    "eval_task = EvalTask(dataset=eval_dataset, metrics=metrics, experiment=_experiment_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4432cee5d0d9"
   },
   "source": [
    "- Run the evaluation task with a run name, model, and prompt template. This step may take a few minutes, depending on the size of the evaluation dataset.\n",
    "\n",
    "<div class=\"alert alert-block alert-info\">\n",
    "<b>⚠️ A unique experiment run name is auto-generated based on the experiment id. ⚠️</b>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "a67a67a1f266"
   },
   "outputs": [],
   "source": [
    "experiment_run_name = generate_uuid(_experiment_id)\n",
    "eval_result = eval_task.evaluate(\n",
    "    model=model,\n",
    "    prompt_template=prompt_template,\n",
    "    experiment_run_name=experiment_run_name,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5b903b035cfd"
   },
   "source": [
    "- After the evaluation task completes, the Vertex AI Rapid Eval SDK returns the result of the run, including summary metrics and a detailed metrics table with per-instance (that is, per-example) metrics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "id": "0c8f0084cd7b"
   },
   "outputs": [],
   "source": [
    "summary_metrics = eval_result.summary_metrics\n",
    "report_df = eval_result.metrics_table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "c678153ab4da"
   },
   "outputs": [],
   "source": [
    "report_df.head(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "a4a70c24c300"
   },
   "outputs": [],
   "source": [
    "summary_metrics"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3b821d981495"
   },
   "source": [
    "- Log the run metrics (both summary and detailed) so you can analyze or compare them in subsequent iterations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "9f22ae37088d"
   },
   "outputs": [],
   "source": [
    "run_path = f\"{task_id}/prompts/{_experiment_id}/{experiment_run_name}\"\n",
    "evals.log_eval_run(\n",
    "    experiment_run_id=experiment_run_name,\n",
    "    experiment=experiment,\n",
    "    eval_result=eval_result,\n",
    "    run_path=run_path,\n",
    "    tags=tags,\n",
    "    metadata=metadata,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6e2257a32bdc"
   },
   "source": [
    "- View all evaluation runs for an experiment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "ba04b1178a39"
   },
   "outputs": [],
   "source": [
    "evals.get_eval_runs(experiment_id=experiment_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "06af3ef1b496"
   },
   "source": [
    "- View all evaluation runs in the system across experiments"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "2a67328f54db"
   },
   "outputs": [],
   "source": [
    "evals.get_all_eval_runs()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3b3e39d70059"
   },
   "source": [
    "## 📊 3. Analyze results\n",
    "\n",
    "This section shows a few ways to analyze and compare results. Because the results are stored in BigQuery tables, you can analyze them in several ways, including:\n",
    "\n",
    "1. Use BigQuery SQL queries\n",
    "2. Use pandas DataFrames with BigQuery\n",
    "3. Build Looker dashboards\n",
    "4. Use tools such as [LLM Comparator](https://medium.com/people-ai-research/llm-comparator-a-tool-for-human-driven-llm-evaluation-81292c17f521) from Google's PAIR team"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1c08c3f65b77"
   },
   "source": [
    "### Get experiments, runs and run details"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "28d1446c721e"
   },
   "source": [
    "- Define an `Evals` object to access the helper functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "id": "e5145dc0520c"
   },
   "outputs": [],
   "source": [
    "evals = Evals()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ea19ad7349c6"
   },
   "source": [
    "- Get all experiments"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "34e484e2d980"
   },
   "outputs": [],
   "source": [
    "evals.get_all_experiments()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0829d626e00c"
   },
   "source": [
    "- Get a specific experiment using `experiment_id`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "e872bf85e1ac"
   },
   "outputs": [],
   "source": [
    "experiment_id = \"Prompt with simple language summary\"\n",
    "evals.get_experiment(experiment_id=experiment_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "d64c20fa3f92"
   },
   "source": [
    "### Basic analysis\n",
    "\n",
    "#### Summary metrics\n",
    "\n",
    "Compare all runs for a given experiment at a summary level. This is useful when you run the same experiment at different points in time: it shows whether the eval metrics vary or change between runs, which indicates how robust the model is."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "8b2231b885c1"
   },
   "outputs": [],
   "source": [
    "evals.get_eval_runs(experiment_id=experiment_id)"
   ]
  },
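  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to quantify run-to-run variation is to compute the mean and standard deviation of a summary metric across runs. The sketch below uses Python's `statistics` module on hand-made values: the run names and scores are placeholders invented for illustration, not output from this notebook, and the sketch assumes nothing about the columns returned by `get_eval_runs`.\n",
    "\n",
    "```python\n",
    "import statistics\n",
    "\n",
    "# Placeholder scores for one metric across three runs of the same experiment.\n",
    "scores = {\"run-1\": 4.0, \"run-2\": 4.5, \"run-3\": 3.5}\n",
    "\n",
    "mean = statistics.mean(scores.values())\n",
    "spread = statistics.stdev(scores.values())  # sample standard deviation\n",
    "\n",
    "print(f\"mean={mean:.2f} stdev={spread:.2f}\")\n",
    "```\n",
    "\n",
    "A small spread relative to the mean suggests the metric is stable across runs; what counts as small depends on the metric's scale and on your success criteria."
   ]
  },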
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "95faac8baa46"
   },
   "source": [
    "#### Detailed metrics\n",
    "\n",
    "You can get detailed eval results for a given experiment run at the example level. This helps you analyze the results and identify loss patterns. To find the `run_id` of a previous run, look up the `run_id` column of the `eval_runs` table in the `gemini_evals_playbook` dataset in BigQuery."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "5a60ddec0ca8"
   },
   "outputs": [],
   "source": [
    "# Replace with the run id of the eval run you want to inspect\n",
    "experiment_run_id = \"[your-run_id]\"\n",
    "evals.get_eval_run_detail(experiment_run_id=experiment_run_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9d58865391e4"
   },
   "source": [
    "### Compare eval runs across experiments"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "a55c76680c48"
   },
   "source": [
    "#### Compare eval runs at summary level\n",
    "\n",
    "You can compare summary metrics for multiple runs side by side, even across different experiments. For example, you can compare eval runs:\n",
    "\n",
    "- For the same prompt at different temperature settings\n",
    "- For the same model settings with different prompt templates or system instructions\n",
    "\n",
    "Pass a list of experiment run ids to compare them side by side."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "79864838ffab"
   },
   "outputs": [],
   "source": [
    "# Run ids (strings) of the eval runs to compare\n",
    "run_ids = [\n",
    "    \"[your-run_id1]\",\n",
    "    \"[your-run_id2]\",\n",
    "]\n",
    "evals.compare_eval_runs(run_ids)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "c2c5f5a3a04b"
   },
   "source": [
    "#### LLM Comparator for analyzing side-by-side LLM evaluation results\n",
    "\n",
    "To visualize model responses from different runs, use the [LLM Comparator](https://github.com/PAIR-code/llm-comparator) Python library from the [Google PAIR team](https://pair.withgoogle.com/) to compare responses from two runs side by side. The library coordinates the three phases of comparative evaluation (judging, bulletizing, and clustering), and you can upload the results to the [LLM Comparator app](https://pair-code.github.io/llm-comparator/) to view and analyze them further."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4e98afc0cf32"
   },
   "source": [
    "- Fetch run details for the two experiment runs you would like to compare.\n",
    "\n",
    "<div class=\"alert alert-block alert-info\">\n",
    "<b> Use <code>evals.get_all_eval_runs()</code> or <code>evals.get_eval_runs(experiment_id=experiment_id)</code> to get run ids.</b>\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "id": "0b126c18151b"
   },
   "outputs": [],
   "source": [
    "# Prepare run details to compare\n",
    "# @markdown ### Enter experiment run id 1\n",
    "run_1 = \"[your-run_id1]\"  # @param {type:\"string\"}\n",
    "run_1_details = evals.get_eval_run_detail(experiment_run_id=run_1)\n",
    "run_1_details = run_1_details[\n",
    "    [\"run_id\", \"dataset_row_id\", \"input_prompt_gcs_uri\", \"output_text\"]\n",
    "]\n",
    "\n",
    "# @markdown ### Enter experiment run id 2\n",
    "run_2 = \"[your-run_id2]\"  # @param {type:\"string\"}\n",
    "run_2_details = evals.get_eval_run_detail(experiment_run_id=run_2)\n",
    "run_2_details = run_2_details[\n",
    "    [\"run_id\", \"dataset_row_id\", \"input_prompt_gcs_uri\", \"output_text\"]\n",
    "]\n",
    "\n",
    "run1_run2 = pd.merge(\n",
    "    run_1_details,\n",
    "    run_2_details,\n",
    "    how=\"outer\",\n",
    "    on=[\"dataset_row_id\"],\n",
    "    suffixes=(\"_1\", \"_2\"),\n",
    ")\n",
    "run1_run2 = run1_run2.rename(\n",
    "    columns={\n",
    "        \"input_prompt_gcs_uri_1\": \"prompt\",\n",
    "        \"output_text_1\": \"response_a\",\n",
    "        \"output_text_2\": \"response_b\",\n",
    "    }\n",
    ")"
   ]
  },
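  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The merge above pairs the two runs on `dataset_row_id`; columns that exist in both frames receive the `_1` and `_2` suffixes before being renamed. The sketch below reproduces that on two tiny hand-made frames (the row ids and response strings are invented for illustration, not real run output) and shows that an outer join keeps a row that only one run produced, leaving the other response missing.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy stand-ins for the per-example details of two runs.\n",
    "run_a = pd.DataFrame(\n",
    "    {\"dataset_row_id\": [\"r1\", \"r2\"], \"output_text\": [\"summary A1\", \"summary A2\"]}\n",
    ")\n",
    "run_b = pd.DataFrame(\n",
    "    {\"dataset_row_id\": [\"r2\", \"r3\"], \"output_text\": [\"summary B2\", \"summary B3\"]}\n",
    ")\n",
    "\n",
    "paired = pd.merge(\n",
    "    run_a, run_b, how=\"outer\", on=[\"dataset_row_id\"], suffixes=(\"_1\", \"_2\")\n",
    ").rename(columns={\"output_text_1\": \"response_a\", \"output_text_2\": \"response_b\"})\n",
    "\n",
    "# Rows present in only one run have a missing response on the other side.\n",
    "complete = paired.dropna(subset=[\"response_a\", \"response_b\"])\n",
    "\n",
    "print(paired)\n",
    "print(complete)\n",
    "```\n",
    "\n",
    "Dropping incomplete pairs before passing the records to the judge is one option; whether to do so depends on how you want to treat examples that are missing from one of the runs."
   ]
  },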
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "f299372d3fb1"
   },
   "source": [
    "- Prepare pairwise comparison file to visualize using LLM Comparator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "8c9ee974cf5e"
   },
   "outputs": [],
   "source": [
    "from llm_comparator import (comparison, llm_judge_runner, model_helper,\n",
    "                            rationale_bullet_generator,\n",
    "                            rationale_cluster_generator)\n",
    "\n",
    "inputs = run1_run2.to_dict(orient=\"records\")\n",
    "\n",
    "custom_fields_schema = [\n",
    "    {\"name\": \"prompt_id\", \"type\": \"string\"},\n",
    "]\n",
    "\n",
    "# Initialize the models-calling classes.\n",
    "generator = model_helper.VertexGenerationModelHelper(model_name=\"gemini-2.0-flash-001\")\n",
    "embedder = model_helper.VertexEmbeddingModelHelper()\n",
    "\n",
    "# Initialize the instances that run work on the models.\n",
    "judge = llm_judge_runner.LLMJudgeRunner(generator)\n",
    "bulletizer = rationale_bullet_generator.RationaleBulletGenerator(generator)\n",
    "clusterer = rationale_cluster_generator.RationaleClusterGenerator(generator, embedder)\n",
    "\n",
    "# Configure and run the comparative evaluation.\n",
    "comparison_result = comparison.run(\n",
    "    inputs, judge, bulletizer, clusterer, judge_opts={\"num_repeats\": 2}\n",
    ")\n",
    "\n",
    "# Write the results to a JSON file that can be loaded in\n",
    "# https://pair-code.github.io/llm-comparator\n",
    "file_path = \"assets/run1_run2_compare.json\"\n",
    "comparison.write(comparison_result, file_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "27948c235371"
   },
   "source": [
    "- You can now upload this file to the LLM Comparator app at https://pair-code.github.io/llm-comparator/ and analyze the results. Refer to the [documentation](https://github.com/PAIR-code/llm-comparator/tree/main?tab=readme-ov-file#using-llm-comparator) for how to use the tool.\n",
    "\n",
    "![LLM Comparator results](assets/llm_comparator_results.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "563ce74f17d9"
   },
   "source": [
    "Based on the analysis, you can identify loss patterns and seed ideas for the next experiment, such as changing the prompt template, system instruction, or model configuration. [Add a new experiment](#️-1-define-and-configure-evaluation-task-and-experiment) and run evaluations until you meet the success criteria for the evaluation task."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "29ceaa591970"
   },
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "823a22721594"
   },
   "source": [
    "## 🧹 Cleaning up\n",
    "\n",
    "Uncomment the commands in the following cell to clean up the resources created as part of the Evals Playbook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "0c6593ba81a6"
   },
   "outputs": [],
   "source": [
    "# # Delete BigQuery Dataset using bq utility\n",
    "# ! bq rm -r -f -d {BQ_DATASET_ID}\n",
    "\n",
    "# # Delete GCS bucket\n",
    "# ! gcloud storage rm --recursive {STAGING_BUCKET_URI}"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "name": "1_gemini_evals_playbook_evaluate.ipynb",
   "toc_visible": true
  },
  "environment": {
   "kernel": "python3",
   "name": "workbench-notebooks.m128",
   "type": "gcloud",
   "uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/workbench-notebooks:m128"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
